Expert networks with mixed continuous and categorical feature variables: a location modeling approach

Author

Geoffrey J McLachlan

Abstract

In medically relevant artificial intelligence, many real-world problems involve both continuous and categorical feature variables. When the data are mixed-mode, the assumption of multivariate Gaussian distributions for the gating network of normalized Gaussian (NG) expert networks, such as the NG mixture of experts (NGME), becomes invalid. An independence model has been studied to handle mixed feature data within the framework of NG expert networks. This method is based on the naive assumption that the categorical variables are independent of each other and of the continuous variables. While this method performs surprisingly well in practice as a way of handling problems with mixed feature variables, the independence assumption is likely to be unrealistic for many practical problems. In this chapter, we investigate a dependence model that allows for some dependence between the categorical and continuous variables by adopting a location modeling approach. We show how the expectation-maximization (EM) algorithm can still be adopted to train the location NG expert networks via the maximum likelihood (ML) approach. With the location model, the categorical variables are uniquely transformed to a single multinomial random variable whose cells correspond to the distinct patterns (locations) of the categorical values. Any associations between the original categorical variables are then converted into relationships among the resulting multinomial cell probabilities. In practice, the dependence model approach becomes intractable when the multinomial distribution replacing the categorical variables has many cells and/or there are many continuous feature variables. An efficient procedure is developed to determine the correlation structure between the categorical and continuous variables in order to minimize the number of parameters in the dependence model.
The method is applied to classify cancer patients on the basis of a continuous gene-expression-profile vector of tumour samples and categorical variables describing the patients' clinical characteristics. The proposed methodologies should have wide application in scientific fields where data with mixed feature variables are collected, such as economics and the biomedical and health sciences. Further extensions of the methodologies to other NG networks and/or to other members of the exponential family of densities for the local output density are discussed.
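
The transformation of several categorical variables into a single multinomial variable can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the three example variables, and their category counts are hypothetical, and the parameter count assumes the classical location model (one mean vector per cell, a common covariance matrix, and free cell probabilities), which the abstract does not spell out.

```python
from itertools import product


def location_index(levels, values):
    """Map one observation's category codes to a single cell index.

    levels: number of categories of each categorical variable.
    values: zero-based category code observed for each variable.
    Uses a mixed-radix encoding, so each distinct pattern gets a distinct cell.
    """
    index = 0
    for n_levels, value in zip(levels, values):
        if not 0 <= value < n_levels:
            raise ValueError("category code out of range")
        index = index * n_levels + value
    return index


def n_location_parameters(n_cells, p):
    """Parameter count under the ASSUMED classical location model:
    (n_cells - 1) cell probabilities, n_cells mean vectors of length p,
    and one common p x p covariance matrix."""
    return (n_cells - 1) + n_cells * p + p * (p + 1) // 2


# Hypothetical example: three categorical variables with 2, 3 and 2 categories.
levels = [2, 3, 2]

n_cells = 1
for n in levels:
    n_cells *= n  # the number of cells is the product of the category counts

# Enumerate every pattern and confirm that the mapping to cells is one-to-one.
cells = {
    location_index(levels, combo): combo
    for combo in product(*(range(n) for n in levels))
}

print(n_cells)                                # number of multinomial cells
print(location_index(levels, (1, 2, 1)))      # cell of one observed pattern
print(n_location_parameters(n_cells, p=5))    # count for 5 continuous variables
```

Because the number of cells is the product of the category counts, it grows quickly as categorical variables are added, which is the intractability the abstract refers to.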

Similar resources

Categorical fracture orientation modeling: applied to an Iranian oil field

Fracture orientation is a prominent factor in determining the reservoir fluid flow direction in a formation because fractures are the major paths through which fluid flow occurs. Hence, a true modeling of orientation leads to a reliable prediction of fluid flow. Traditionally, various distributions are used for orientation modeling in fracture networks. Although they offer a fairly suitable est...

Maximum trimmed likelihood estimator for multivariate mixed continuous and categorical data

In this article we apply the maximum trimmed likelihood (MTL) approach (Hadi and Luceño 1997) to obtain the robust estimators of multivariate location and shape, especially for data mixed with continuous and categorical variables. The forward search algorithm (Atkinson 1994) is adapted to compute the proposed MTL estimates. A simulation study shows that the proposed estimator outperfor...

Comparison of the decision tree, artificial neural network, and linear regression methods based on the number and types of independent variables and sample size

In this article, the performance of data mining and statistical techniques was empirically compared while varying the number of independent variables, the types of independent variables, the number of classes of the independent variables, and the sample size. Our study employed 60 simulated examples, with artificial neural networks and decision trees as the data mining techniques, and linear re...

Using Bayesian Networks and Simulation for Data Fusion and Risk Analysis

Bayesian networks (BNs) were pioneered to solve problems in Artificial Intelligence (AI) and have proven successful in “intelligent” applications such as medical expert systems, speech recognition, and fault diagnosis. In practical terms, one of the major benefits from using BNs is in that probabilistic and causal relationships among variables are represented and executed as graphs and can thus...

FORCED WATER MAIN DESIGN MIXED ANT COLONY OPTIMIZATION

Most real world engineering design problems, such as cross-country water mains, include combinations of continuous, discrete, and binary value decision variables. Very often, the binary decision variables associate with the presence and/or absence of some nominated alternatives or project’s components. This study extends an existing continuous Ant Colony Optimization (ACO) algorithm to simultan...

Publication date: 2008
